Abstract

We investigate the problem of active learning on a given tree whose nodes are assigned
binary labels in an adversarial way. Inspired by recent results by Guillory and Bilmes, we
characterize (up to constant factors) the optimal placement of queries so as to minimize the
mistakes made on the non-queried nodes. Our query selection algorithm is extremely efficient, and
the optimal number of mistakes on the non-queried nodes is achieved by a simple and efficient
mincut classifier. Through a simple modification of the query selection algorithm we also show
optimality (up to constant factors) with respect to the trade-off between number of queries and
number of mistakes on non-queried nodes. By using spanning trees, our algorithms can be
efficiently applied to general graphs, although the problem of finding optimal and efficient active
learning algorithms for general graphs remains open. Towards this end, we provide a lower
bound on the number of mistakes made on arbitrary graphs by any active learning algorithm
using a number of queries which is up to a constant fraction of the graph size.

1 Introduction 

The abundance of networked data in various application domains (web, social networks,
bioinformatics, etc.) motivates the development of scalable and accurate graph-based prediction
algorithms. An important topic in this area is the graph binary classification problem: given a graph
with unknown binary labels on its nodes, the learner receives the labels on a subset of the nodes
(the training set) and must predict the labels on the remaining vertices. This is typically done by
relying on some notion of label regularity depending on the graph topology, such as that nearby
nodes are likely to be labeled similarly. Standard approaches to this problem predict with the
assignment of labels minimizing the induced cutsize (e.g., [4, 5]), or by binarizing the assignment
that minimizes certain real-valued extensions of the cutsize function (e.g., [14, 2, 3] and references
therein).

In the active learning version of this problem the learner is allowed to choose the subset of
training nodes. Similarly to standard feature-based learning, one expects active methods to provide
a significant boost of predictive ability compared to an uninformed (e.g., random) draw of the
training set. The following simple example provides some intuition of why this could happen
when the labels are chosen by an adversary, which is the setting considered in this paper. Consider
a "binary star system" of two star-shaped graphs whose centers are connected by a bridge, where
one star is a constant fraction bigger than the other. The adversary draws two random binary labels
and assigns the first label to all nodes of the first star graph, and the second label to all nodes of
the second star graph. Assume that the training set size is two. If we choose the centers of the two
stars and predict with a mincut strategy¹, we are guaranteed to make zero mistakes on all unseen
vertices. On the other hand, if we query two nodes at random, then with constant probability both
of them will belong to the bigger star, and all the unseen labels of the smaller star will be mistaken
whenever the two star labels differ. This simple example shows that the gap between the performance
of passive and active learning on graphs can be made arbitrarily large.
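The claim that random querying fails with constant probability can be made quantitative with a short calculation. The notation is ours and is an assumption, not taken from the text: the bigger star has $n_1$ nodes, the smaller one $n_2$ nodes, $n_1 = (1+c)\,n_2$ for a constant $c > 0$, and the two passive queries are drawn uniformly at random without replacement. Then

$$\Pr[\text{both queries in the bigger star}] = \frac{n_1(n_1-1)}{(n_1+n_2)(n_1+n_2-1)} \;\longrightarrow\; \Big(\frac{1+c}{2+c}\Big)^2 \quad (n_2 \to \infty).$$

In that event both observed labels coincide, so the mincut labeling assigns one label to the whole graph, and all $n_2$ nodes of the smaller star are mistaken whenever the two star labels differ (probability $1/2$):

$$\mathbb{E}[\text{mistakes with random queries}] \;\ge\; \frac{1}{2}\cdot\frac{n_1(n_1-1)}{(n_1+n_2)(n_1+n_2-1)}\cdot n_2 \;=\; \Omega(n_2).$$

Querying the two centers instead gives zero mistakes, so the expected gap grows linearly with $n_2$.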

In general, one would like to devise a strategy for placing a certain budget of queries on the
vertices of a given graph. This should be done so as to minimize the number of mistakes made on
the non-queried nodes by some reasonable classifier like mincut. This question has been
investigated from a theoretical viewpoint by Guillory and Bilmes [6], and by Afshani et al. [1]. Our work
is related to an elegant result from [6] which bounds the number of mistakes made by the mincut
classifier on the worst-case assignment of labels in terms of $\Phi/\Psi(L)$. Here $\Phi$ is the cutsize induced
by the unknown labeling, and $\Psi(L)$ is a function of the query (or training) set $L$, which depends on
the structural properties of the (unlabeled) graph. For instance, in the above example of the binary
system, the value of $\Psi(L)$ when the query set $L$ includes just the two centers is 1. This implies
that for the binary system graph, Guillory and Bilmes' bound on the mincut strategy is $\Phi$ mistakes
in the worst case (note that in the above example $\Phi \le 1$). Since $\Psi(L)$ can be efficiently computed
on any given graph and query set $L$, the learner's task might be reduced to finding a query set $L$
that maximizes $\Psi(L)$ given a certain query budget (size of $L$). Unfortunately, no feasible general
algorithm for solving this maximization problem is known, and so one must resort to heuristic
methods (see [6]).

In this work we investigate the active learning problem on graphs in the important special case
of trees. We exhibit a simple iterative algorithm which, combined with a mincut classifier, is
optimal (up to constant factors) on any given labeled tree. This holds even if the algorithm is not
given information on the actual cutsize $\Phi$. Our method is extremely efficient, requiring $O(n \ln Q)$
time for placing $Q$ queries in an $n$-node tree, and space linear in $n$. As a byproduct of our analysis,
we show that $\Psi$ can be efficiently maximized over trees to within constant factors. Hence the
bound $\min_L \Phi/\Psi(L)$ can be achieved efficiently.

¹ A mincut strategy considers all labelings consistent with the labels observed so far, and chooses among them one
that minimizes the resulting cutsize over the whole graph.

Another interesting question is what kind of trade-off between queries and mistakes can be
achieved if the learner is not constrained by a given query budget. We show that a simple
modification of our selection algorithm is able to trade off queries and mistakes in an optimal way up to
constant factors.

Finally, we prove a general lower bound for predicting the labels of any given graph (not 
necessarily a tree) when the query set is up to a constant fraction of the number of vertices. Our 
lower bound establishes that the number of mistakes must then be at least a constant fraction of the 
cutsize weighted by the effective resistances. This lower bound apparently yields a contradiction to
the results of Afshani et al. [1], who construct the query set adaptively. This apparent contradiction
is also obtained via a simple counterexample that we detail in Section 5.

2 Preliminaries and basic notation 

A labeled tree $(T, y)$ is a tree $T = (V, E)$ whose nodes $V = \{1, \dots, n\}$ are assigned binary labels
$y = (y_1, \dots, y_n) \in \{-1, +1\}^n$. We measure the label regularity of $(T, y)$ by the cutsize $\Phi_T(y)$
induced by $y$ on $T$, i.e., $\Phi_T(y) = |\{(i, j) \in E : y_i \ne y_j\}|$. We consider the following active
learning protocol: given a tree $T$ with unknown labeling $y$, the learner obtains all labels in a query
set $L \subseteq V$, and is then required to predict the labels of the remaining nodes $V \setminus L$. Active learning
algorithms work in two phases: a selection phase, where a query set of given size is constructed,
and a prediction phase, where the algorithm receives the labels of the query set and predicts the
labels of the remaining nodes. Note that the only labels ever observed by the algorithm are those
in the query set. In particular, no labels are revealed during the prediction phase.
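As a concrete reading of the definition of $\Phi_T(y)$, the following Python sketch computes the cutsize of a labeling. It assumes, for illustration only, that nodes are numbered $0, \dots, n-1$ (rather than $1, \dots, n$), that the tree is given as a list of edges, and that labels are stored in a sequence indexed by node; the function name `cutsize` is hypothetical.

```python
from typing import Sequence


def cutsize(edges: Sequence[tuple[int, int]], y: Sequence[int]) -> int:
    """Phi_T(y): number of edges (i, j) of T whose endpoints have different labels."""
    return sum(1 for i, j in edges if y[i] != y[j])


# Path 0 - 1 - 2 - 3 labeled +1, +1, -1, -1: only edge (1, 2) is cut.
print(cutsize([(0, 1), (1, 2), (2, 3)], [+1, +1, -1, -1]))  # 1
```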

We measure the ability of the algorithm by the number of prediction mistakes made on $V \setminus L$,
where it is reasonable to expect this number to depend on both the unknown cutsize $\Phi_T(y)$ and the
number $|L|$ of requested labels. A slightly different prediction measure is considered in Section 4.3.

Given a tree $T$ and a query set $L \subseteq V$, a node $i \in V \setminus L$ is a fork node generated by $L$ if and
only if there exist three distinct nodes $i_1, i_2, i_3 \in L$ that are connected to $i$ through edge-disjoint
paths. Let $\mathrm{fork}(L)$ be the set of all fork nodes generated by $L$. Then $L^+$ is the query set obtained
by adding to $L$ all the generated fork nodes, i.e., $L^+ = L \cup \mathrm{fork}(L)$. We say that $L \subseteq V$ is
0-forked iff $L^+ = L$. Note that $L^+$ is 0-forked, that is, $\mathrm{fork}(L^+) = \emptyset$ for all $L \subseteq V$.
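The fork-node definition can be checked directly on a tree. In a tree, paths that leave $i$ through different neighbours are edge-disjoint, while paths through the same neighbour share that first edge; hence $i \notin L$ is a fork node exactly when at least three components of $T \setminus \{i\}$ contain a node of $L$. The Python sketch below relies on this reformulation (our own observation, not stated in the text) and needs a single rooted traversal. The adjacency-list representation with nodes $0, \dots, n-1$ and the names `fork_nodes` and `l_plus` are hypothetical.

```python
from collections import deque


def fork_nodes(adj: list[list[int]], L: set[int]) -> set[int]:
    """fork(L): nodes i outside L such that >= 3 components of T - {i} meet L."""
    n = len(adj)
    if n == 0 or not L:
        return set()
    # Root the tree at node 0 and record a BFS order.
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = []
    queue = deque([0])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                queue.append(v)
    # below[v] = number of nodes of L in the subtree rooted at v.
    below = [1 if v in L else 0 for v in range(n)]
    for u in reversed(order):
        if parent[u] != -1:
            below[parent[u]] += below[u]
    total = below[0]
    forks = set()
    for i in range(n):
        if i in L:
            continue
        # Branches below i (one per child) that contain a node of L ...
        branches = sum(1 for c in adj[i] if c != parent[i] and below[c] > 0)
        # ... plus the branch above i, i.e. everything outside the subtree of i.
        if parent[i] != -1 and total - below[i] > 0:
            branches += 1
        if branches >= 3:
            forks.add(i)
    return forks


def l_plus(adj: list[list[int]], L: set[int]) -> set[int]:
    """L+ = L union fork(L)."""
    return set(L) | fork_nodes(adj, set(L))


star = [[1, 2, 3], [0], [0], [0]]   # center 0 with leaves 1, 2, 3
print(fork_nodes(star, {1, 2, 3}))   # {0}
print(l_plus(star, {1, 2}))          # {1, 2}: no fork node
```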

Given a node subset $S \subseteq V$, we use $T \setminus S$ to denote the forest obtained by removing from
the tree $T$ all nodes in $S$ and all edges incident to them. Moreover, given a second tree $T'$, we
denote by $T \setminus T'$ the forest $T \setminus V'$, where $V'$ is the set of nodes of $T'$. Given a query set $L \subseteq V$, a
hinge-tree is any connected component of $T \setminus L^+$. We call connection node of a hinge-tree a node
of $L^+$ adjacent to any node of the hinge-tree. We distinguish between 1-hinge-trees and 2-hinge-trees. A
1-hinge-tree has one connection node only, whereas a 2-hinge-tree has two (note that a hinge-tree
cannot have more than two connection nodes because $L^+$ is 0-forked: three connection nodes would make
some node of the hinge-tree a fork node of $L^+$; see Figure 1).
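To see why 0-forkedness caps the number of connection nodes at two, the sketch below lists the components of $T \setminus S$ together with their connection nodes for an arbitrary node set $S$. It is a Python illustration under the same hypothetical conventions (adjacency lists, nodes $0, \dots, n-1$); `hinge_trees` is a name of our choosing. When $S = L^+$ is 0-forked every component comes out as a 1-hinge-tree or a 2-hinge-tree, whereas for a set that is not 0-forked a component can have three or more connection nodes.

```python
def hinge_trees(adj: list[list[int]], S: set[int]) -> list[tuple[list[int], list[int]]]:
    """Components of T - S, each paired with its connection nodes (nodes of S adjacent to it)."""
    comp_id = [-1] * len(adj)
    result = []
    for s in range(len(adj)):
        if s in S or comp_id[s] != -1:
            continue
        comp_id[s] = len(result)
        nodes, conn, stack = [], set(), [s]
        while stack:                      # depth-first search inside T - S
            u = stack.pop()
            nodes.append(u)
            for v in adj[u]:
                if v in S:
                    conn.add(v)           # edge leaving the component towards S
                elif comp_id[v] == -1:
                    comp_id[v] = len(result)
                    stack.append(v)
        result.append((sorted(nodes), sorted(conn)))
    return result


path = [[1], [0, 2], [1, 3], [2, 4], [3]]   # path 0 - 1 - 2 - 3 - 4
print(hinge_trees(path, {0, 2}))    # [([1], [0, 2]), ([3, 4], [2])]: a 2-hinge and a 1-hinge tree
star = [[1, 2, 3], [0], [0], [0]]
print(hinge_trees(star, {1, 2, 3}))  # [([0], [1, 2, 3])]: {1, 2, 3} is not 0-forked
```

On the star, the center is exactly the fork node of $\{1, 2, 3\}$, so passing to $\{1, 2, 3\}^+$ removes the offending component altogether.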

3 The active learning algorithm 

We now describe the two phases of our active learning algorithm. For the sake of exposition,
we call SEL the selection phase and PRED the prediction phase. SEL returns a 0-forked query set
$L^+_{\mathrm{SEL}} \subseteq V$ of desired size. PRED takes as input the query set $L^+_{\mathrm{SEL}}$ and the set of labels $y_i$ for all
$i \in L^+_{\mathrm{SEL}}$. Then PRED returns a prediction for the labels of all remaining nodes $V \setminus L^+_{\mathrm{SEL}}$.

Figure 1: A tree $T = (V, E)$ whose nodes are shaded (the query set $L$) or white (the set $V \setminus L$).
The shaded nodes are also the connection nodes of the depicted hinge trees (not all hinge trees are
contoured). The fork nodes generated by $L$ are denoted by double circles. The thick black edges
connect the nodes in $L$.

In order to see the way SEL operates, we formally introduce the function $\Psi^*$. This is the
reciprocal of the $\Psi$ function introduced in [6] and mentioned in Section 1.

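Definition 1. Given a tree $T = (V, E)$ and a set of nodes $L \subseteq V$,

$$\Psi^*(L) = \max_{\emptyset \ne V' \subseteq V \setminus L} \frac{|V'|}{\big|\{(i,j) \in E : i \in V',\ j \in V \setminus V'\}\big|}.$$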

In words, $\Psi^*(L)$ is the largest ratio between the size of a set $V'$ of nodes not in $L$ and the number
of edges connecting $V'$ to the rest of the tree: large sets that share few edges with the other nodes
make $\Psi^*(L)$ large. From the adversary's viewpoint, $\Psi^*(L)$ can be described as the largest return in
mistakes per unit of cutsize invested. We now move on to the description of the algorithms SEL
and PRED.
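Definition 1 can be evaluated by brute force on very small trees, which is a convenient way to sanity-check hand computations of $\Psi^*(L)$. The Python sketch below enumerates every nonempty $V' \subseteq V \setminus L$, so it is exponential in $|V \setminus L|$ and only meant for toy examples. It assumes the tree is connected and $L \ne \emptyset$ (otherwise $V' = V$ has no boundary edge and the ratio is undefined); `psi_star` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations


def psi_star(edges: list[tuple[int, int]], n: int, L: set[int]) -> Fraction:
    """Brute-force Psi*(L): max over nonempty V' inside V - L of |V'| / (#edges leaving V')."""
    if not L:
        raise ValueError("L must be nonempty, otherwise V' = V has no boundary edge")
    free = [v for v in range(n) if v not in L]
    best = Fraction(0)
    for r in range(1, len(free) + 1):
        for subset in combinations(free, r):
            inside = set(subset)
            # Edges with exactly one endpoint in V' (the denominator of Definition 1).
            boundary = sum(1 for i, j in edges if (i in inside) != (j in inside))
            best = max(best, Fraction(len(inside), boundary))
    return best


path5 = [(0, 1), (1, 2), (2, 3), (3, 4)]
print(psi_star(path5, 5, {0, 4}))  # 3/2, attained by V' = {1, 2, 3}
```

The path example also illustrates the later lemmas: the maximizer $\{1, 2, 3\}$ is a single component of $T \setminus L$, and it is a 2-hinge-tree with exactly two boundary edges.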

The selection algorithm SEL greedily computes a query set that minimizes $\Psi^*$ to within constant
factors. To this end, SEL exploits Lemma 9(a) (see Section 4.2), which states that, for any fixed query
set $L$, a subset $V' \subseteq V \setminus L$ maximizing $\frac{|V'|}{|\{(i,j) \in E : i \in V',\ j \in V \setminus V'\}|}$ can always be taken
inside a single connected component of $T \setminus L$. Thus SEL places its queries so as to end up with a
query set $L^+_{\mathrm{SEL}}$ such that the largest component of $T \setminus L^+_{\mathrm{SEL}}$ is as small as possible.

SEL operates as follows. Let $L_t \subseteq L_{\mathrm{SEL}}$ be the set including the first $t$ nodes chosen by SEL, $T^t_{\max}$
be the largest connected component of $T \setminus L_{t-1}$, and $\sigma(T', i)$ be the size (number of nodes) of the
largest component of the forest $T' \setminus \{i\}$, where $T'$ is any tree. At each step $t = 1, 2, \dots$, SEL simply
picks the node $i_t \in T^t_{\max}$ that minimizes $\sigma(T^t_{\max}, i)$ over $i$ and sets $L_t = L_{t-1} \cup \{i_t\}$. During this
iterative construction, SEL also maintains a set containing all fork nodes generated in each step by
adding nodes $i_t$ to the sets $L_{t-1}$.² After the desired number of queries is reached (also counting the
queries that would be caused by the stored fork nodes), SEL has terminated the construction of the
query set $L_{\mathrm{SEL}}$. The final query set $L^+_{\mathrm{SEL}}$, obtained by adding all stored fork nodes to $L_{\mathrm{SEL}}$, is then
returned.

² In Section 6 we will see that during each step $L_{t-1} \to L_t$ at most a single new fork node may be generated.
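A direct, unoptimized Python sketch of the greedy rule follows. It recomputes the components of $T \setminus L_{t-1}$ from scratch at every step, so it takes $O(|V|\,Q)$ time overall instead of the $O(|V| \log Q)$ bound of Section 6, and it returns only $L_{\mathrm{SEL}}$; $L^+_{\mathrm{SEL}}$ is then obtained by adding $\mathrm{fork}(L_{\mathrm{SEL}})$. Ties between equally large components, or between minimizers of $\sigma$, are broken by whichever comes first, an arbitrary choice on our part. The names `components`, `sigma_minimizer` and `sel`, and the adjacency-list input, are hypothetical.

```python
def components(adj: list[list[int]], removed) -> list[list[int]]:
    """Connected components (lists of nodes) of the forest T - removed."""
    seen = set(removed)
    comps = []
    for s in range(len(adj)):
        if s in seen:
            continue
        seen.add(s)
        comp, stack = [], [s]
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def sigma_minimizer(adj: list[list[int]], comp: list[int]) -> int:
    """A node i of the subtree `comp` minimizing sigma(comp, i), i.e. a centroid."""
    inside = set(comp)
    parent = {comp[0]: None}
    order = [comp[0]]
    for u in order:                     # BFS; `order` grows while we iterate over it
        for v in adj[u]:
            if v in inside and v not in parent:
                parent[v] = u
                order.append(v)
    size = {u: 1 for u in order}        # subtree sizes, accumulated leaves first
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]

    def sigma(i: int) -> int:
        below = [size[v] for v in adj[i] if v in inside and parent[v] == i]
        return max(below + [len(comp) - size[i]])   # last term: the part above i

    return min(order, key=sigma)


def sel(adj: list[list[int]], Q: int) -> list[int]:
    """The first Q nodes i_1, ..., i_Q chosen by SEL (L_SEL without fork nodes)."""
    chosen: list[int] = []
    for _ in range(Q):
        comps = components(adj, chosen)
        if not comps:                   # every node has already been queried
            break
        t_max = max(comps, key=len)     # a largest component of T - L_{t-1}
        chosen.append(sigma_minimizer(adj, t_max))
    return chosen


path7 = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
print(sel(path7, 3))  # [3, 1, 5]
```

On the path with seven nodes SEL first queries the middle node and then the centers of the two halves, which is the halving behaviour quantified by Lemma 3.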

The prediction algorithm PRED receives as input the labeled nodes of the 0-forked query set
$L^+_{\mathrm{SEL}}$ and computes a mincut assignment. Since each component of $T \setminus L^+_{\mathrm{SEL}}$ is either a 1-hinge-tree
or a 2-hinge-tree, PRED is simple to describe and is also very efficient. The algorithm predicts all
the nodes of hinge-tree $T$ using the same label $\hat{y}_T$. This label is chosen according to the following
two cases:

1. If $T$ is a 1-hinge-tree, then $\hat{y}_T$ is set to the label of its unique connection node;

2. If $T$ is a 2-hinge-tree and the labels of its two connection nodes are equal, then $\hat{y}_T$ is set to
the label of its connection nodes; otherwise $\hat{y}_T$ is set to the label of the closer connection
node (ties are broken arbitrarily).
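The prediction rule can be sketched in a few lines of Python. The sketch takes the labels of a 0-forked query set as a dictionary from node to $\pm 1$ (a hypothetical interface), labels each hinge-tree with the label of the first connection node it finds, and raises an error if some component has more than two connection nodes. Note one simplification: when the two connection nodes of a 2-hinge-tree disagree, the sketch does not implement the closer-connection-node rule but just picks one of them. Either choice still yields a mincut labeling of that hinge-tree (exactly one cut edge), and the bound of Lemma 5 below only relies on hinge-trees without disagreeing edges being predicted correctly. The names `pred` and `mistakes` are ours.

```python
def pred(adj: list[list[int]], labels: dict[int, int]) -> dict[int, int]:
    """Predict every node outside the 0-forked query set whose labels are given."""
    out: dict[int, int] = {}
    for s in range(len(adj)):
        if s in labels or s in out:
            continue
        nodes, conn, stack = [], [], [s]
        out[s] = 0                                  # provisional value: marks s as visited
        while stack:                                # collect the hinge-tree containing s
            u = stack.pop()
            nodes.append(u)
            for v in adj[u]:
                if v in labels:
                    if v not in conn:
                        conn.append(v)              # connection node
                elif v not in out:
                    out[v] = 0
                    stack.append(v)
        if len(conn) > 2:
            raise ValueError("query set is not 0-forked")
        # 1-hinge: its only connection label; 2-hinge with agreeing labels: that label;
        # disagreeing labels: the first connection node found (simplified rule);
        # no connection node at all (empty query set): arbitrary label +1.
        y_hat = labels[conn[0]] if conn else +1
        for u in nodes:
            out[u] = y_hat
    return out


def mistakes(pred_labels: dict[int, int], y) -> int:
    """Number of predicted nodes whose prediction differs from the true label."""
    return sum(1 for u, label in pred_labels.items() if label != y[u])


path7 = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
y = [+1, +1, +1, +1, -1, -1, -1]                    # cutsize 1 (edge 3 - 4)
p = pred(path7, {v: y[v] for v in (0, 3, 6)})     # L+ = {0, 3, 6} is 0-forked
print(p, mistakes(p, y))  # {1: 1, 2: 1, 4: 1, 5: 1} 2
```

Both mistakes fall in the hinge-tree $\{4, 5\}$, the only one touched by a disagreeing edge, and two mistakes is exactly the size of one hinge-tree here.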

In Section 6 we show that SEL requires overall $O(|V| \log Q)$ time and $O(|V|)$ memory space for
selecting $Q$ query nodes. Also, we will see that the total running time taken by PRED for predicting
all nodes in $V \setminus L$ is linear in $|V|$.

4 Analysis 

For a given tree $T$, we denote by $m_A(L, y)$ the number of prediction mistakes that algorithm $A$
makes on the labeled tree $(T, y)$ when given the query set $L$. Introduce the function

$$m_A(L, K) = \max_{y \,:\, \Phi_T(y) \le K} m_A(L, y)$$

denoting the maximum number of prediction mistakes made by $A$ with query set $L$ over all labeled trees
with cutsize bounded by $K$. We will also find it useful to deal with the "lower bound" function
$\mathrm{lb}(L, K)$. This is the maximum expected number of mistakes that any prediction algorithm
$A$ can be forced to make on the labeled tree $(T, y)$ when the query set is $L$ and the cutsize is not
larger than $K$.

We show that the number of mistakes made by PRED on any labeled tree when using the query
set $L^+_{\mathrm{SEL}}$ satisfies

$$m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, K) \le 10\,\mathrm{lb}(L, K)$$

for all query sets $L \subseteq V$ of size up to $\frac{1}{8}|L^+_{\mathrm{SEL}}|$. Though neither SEL nor PRED knows the actual
cutsize of the labeled tree $(T, y)$, the combined use of these procedures is competitive against any
algorithm that knows the cutsize budget $K$ beforehand.

While this result implies the optimality (up to constant factors) of our algorithm, it does not
relate the mistake bound to the cutsize, which is a clearly interpretable measure of the label
regularity. In order to address this issue, we show that our algorithm also satisfies a bound on
$m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, y)$ expressed in terms of the cutsize $\Phi_T(y)$ and of the function $\Psi^*(L)$ of Definition 1,
for all query sets $L \subseteq V$ of size up to $\frac{1}{8}|L^+_{\mathrm{SEL}}|$. The proof of these results needs a number of
preliminary lemmas.
Lemma 1. For any tree $T = (V, E)$ it holds that $\min_{v \in V} \sigma(T, v) \le \frac{1}{2}|V|$.

Proof. Let $i \in \mathrm{argmin}_{v \in V}\, \sigma(T, v)$. For the sake of contradiction, assume there exists a component
$T_i = (V_i, E_i)$ of $T \setminus \{i\}$ such that $|V_i| > |V|/2$. Let $s$ be the sum of the sizes of all other components.
Since $|V_i| + s = |V| - 1$, we know that $s < |V|/2 - 1$. Now let $j$ be the node adjacent to
$i$ which belongs to $V_i$, and let $T_j = (V_j, E_j)$ be the largest component of $T \setminus \{j\}$. There are only
two cases to consider: either $V_j \subset V_i$ or $V_j \cap V_i = \emptyset$. In the first case, $|V_j| < |V_i|$. In the
second case, $V_j \subseteq \{i\} \cup (V \setminus V_i)$, which implies $|V_j| \le 1 + s < |V|/2 < |V_i|$. In both cases,
$\sigma(T, j) = |V_j| < |V_i| = \sigma(T, i)$, so $i \notin \mathrm{argmin}_{v \in V}\, \sigma(T, v)$, which provides the desired contradiction. □
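Lemma 1 is the classical statement that every tree has a centroid. The short Python check below computes $\min_v \sigma(T, v)$ from subtree sizes after rooting the tree at node 0; this is a standard technique rather than anything specific to the paper, and `min_sigma` and the adjacency-list input are hypothetical.

```python
def min_sigma(adj: list[list[int]]) -> int:
    """min over v of sigma(T, v): the size of the largest component of T - {v}."""
    n = len(adj)
    parent = [-1] * n
    seen = [False] * n
    seen[0] = True
    order = [0]
    for u in order:                      # BFS from node 0; `order` grows while we iterate
        for v in adj[u]:
            if not seen[v]:
                seen[v] = True
                parent[v] = u
                order.append(v)
    size = [1] * n
    for u in reversed(order):            # subtree sizes, leaves first
        if parent[u] != -1:
            size[parent[u]] += size[u]
    best = n
    for v in range(n):
        branches = [size[c] for c in adj[v] if c != parent[v]]   # components below v
        branches.append(n - size[v])                             # component above v
        best = min(best, max(branches))
    return best


path7 = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
print(min_sigma(path7))  # 3, attained by the middle node
```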

Lemma 2. For all subsets $L \subseteq V$ of the nodes of a tree $T = (V, E)$ we have $|L^+| \le 2|L|$.

Proof. Pick an arbitrary node of $T$ and perform a depth-first visit of all nodes in $T$. This visit
induces an ordering $T_1, T_2, \dots$ of the connected components in $T \setminus L$ based on the order of the
nodes visited first in each component. Now let $T'_1, T'_2, \dots$ be such that each $T'_i$ is the component
$T_i$ extended to include all nodes of $L$ adjacent to nodes in $T_i$. Then the ordering implies that, for
$i \ge 2$, $T'_i$ shares exactly one node (which must be a leaf) with all previously visited trees. Since in
any tree the number of nodes of degree larger than two must be strictly smaller than the number of
leaves, we have $|\mathrm{fork}(T'_i)| < |\Lambda_i|$ where, with slight abuse of notation, we denote by $\mathrm{fork}(T'_i)$
the set of all fork nodes in subtree $T'_i$, and we let $\Lambda_i$ be the set of leaves of $T'_i$. This implies that,
for $i = 1, 2, \dots$, each fork node in $\mathrm{fork}(T'_i)$ can be injectively associated with one of the $|\Lambda_i| - 1$
leaves of $T'_i$ that are not shared with any of the previously visited trees. Since $|\mathrm{fork}(L)|$ is equal
to the sum of $|\mathrm{fork}(T'_i)|$ over all indices $i$, this implies that $|\mathrm{fork}(L)| < |L|$, and therefore
$|L^+| = |L| + |\mathrm{fork}(L)| \le 2|L|$. □

Lemma 3. Let $L_{t-1} \subseteq L_{\mathrm{SEL}}$ be the set of the first $t - 1$ nodes chosen by SEL. Given any tree
$T = (V, E)$, the largest subtree of $T \setminus L_{t-1}$ contains no more than $2|V|/t$ nodes.

Proof. Recall that $i_s$ denotes the $s$-th node selected by SEL during the incremental construction
of the query set $L_{\mathrm{SEL}}$, and that $T^s_{\max}$ is the largest component of $T \setminus L_{s-1}$. The first $t$ steps of the
recursive splitting procedure performed by SEL can be associated with a splitting tree $T'$ defined
in the following way. The internal nodes of $T'$ are the trees $T^s_{\max}$, for $s \ge 1$. The children of $T^s_{\max}$ are
the connected components of $T^s_{\max} \setminus \{i_s\}$, i.e., the subtrees of $T^s_{\max}$ created by the selection of $i_s$.
Hence, each leaf of $T'$ is bijectively associated with a tree in $T \setminus L_t$.

Let $T'_{\mathrm{int}}$ be the tree obtained from $T'$ by deleting all leaves. Each node of $T'_{\mathrm{int}}$ is one of the $t$
subtrees split by SEL during the construction of $L_t$. As $T^t_{\max}$ is split by $i_t$, it is a leaf in $T'_{\mathrm{int}}$. We
now add a second child to each internal node $s$ of $T'_{\mathrm{int}}$ having a single child. This second child of
$s$ is obtained by merging all the subtrees belonging to leaves of $T'$ that are also children of $s$. Let
$T''$ be the resulting tree.

We now compare the cardinality of $T^t_{\max}$ to that of the subtrees associated with the leaves of
$T''$. Let $\Lambda$ be the set of all leaves of $T''$ and $\Lambda_{\mathrm{add}} = T'' \setminus T'_{\mathrm{int}} \subseteq \Lambda$ be the set of all leaves added to
$T'_{\mathrm{int}}$ to obtain $T''$. First of all, note that $|T^t_{\max}|$ is not larger than the number of nodes in any leaf of
$T'_{\mathrm{int}}$. This is because the selection rule of SEL ensures that $T^t_{\max}$ cannot be larger than any subtree
associated with a leaf in $T'_{\mathrm{int}}$, since it contains no node selected before time $t$. In what follows, we
write $|s|$ to denote the size of the forest or subtree associated with a node $s$ of $T''$. We now prove
the following claim:

Claim. For all $\ell \in \Lambda \setminus \Lambda_{\mathrm{add}}$, $|T^t_{\max}| \le |\ell|$, and for all $\ell \in \Lambda_{\mathrm{add}}$, $|T^t_{\max}| - 1 \le |\ell|$.

Proof of Claim. The first part just follows from the observation that any $\ell \in \Lambda \setminus \Lambda_{\mathrm{add}}$ was split by SEL
no later than time $t$. In order to prove the second part, pick a leaf $\ell \in \Lambda_{\mathrm{add}}$. Let $\ell'$ be its unique sibling
in $T''$ and let $p$ be the parent of $\ell$ and $\ell'$, also in $T''$. Lemma 1 applied to the subtree $p$ implies
$|\ell'| \le \frac{1}{2}|p|$. Moreover, since $|\ell| + |\ell'| = |p| - 1$, we obtain $|\ell| + 1 \ge \frac{1}{2}|p| \ge |\ell'| \ge |T^t_{\max}|$, the last
inequality using the first part of the claim. This implies $|T^t_{\max}| - 1 \le |\ell|$, and the claim is proven.

Let now $N(\Lambda)$ be the number of nodes in subtrees and forests associated with the leaves of $T''$.
With each internal node of $T''$ we can associate a node of $L_{\mathrm{SEL}}$ which does not belong to any
leaf in $\Lambda$. Moreover, the number $|T'' \setminus \Lambda|$ of internal nodes in $T''$ is not smaller than the number
$|\Lambda_{\mathrm{add}}|$ of internal nodes of $T'_{\mathrm{int}}$ to which a child has been added. Since these subtrees and forests
are all distinct, we obtain $N(\Lambda) + |\Lambda_{\mathrm{add}}| \le N(\Lambda) + |T'' \setminus \Lambda| \le |V|$. Hence, using the above
claim we can write $N(\Lambda) \ge (|\Lambda| - |\Lambda_{\mathrm{add}}|)\,|T^t_{\max}| + |\Lambda_{\mathrm{add}}|\,(|T^t_{\max}| - 1)$, which implies
$|T^t_{\max}| \le (N(\Lambda) + |\Lambda_{\mathrm{add}}|)/|\Lambda| \le |V|/|\Lambda|$. Since each internal node of $T''$ has at least two children, we have
that $|\Lambda| \ge |T''|/2 \ge |T'_{\mathrm{int}}|/2 = t/2$. Hence, we can conclude that $|T^t_{\max}| \le 2|V|/t$. □

4.1 Lower bounds 

We now state and prove a lower bound on the number of mistakes that any prediction algorithm
(even knowing the cutsize budget $K$) makes on any given tree, when the query set $L$ is 0-forked.
The bound depends on the following quantity: given a tree $T = (V, E)$, a node subset $L \subseteq V$ and an
integer $K$, the component function $\Upsilon(L, K)$ is the sum of the sizes of the $K$ largest components
of $T \setminus L$, or $|V \setminus L|$ if $T \setminus L$ has fewer than $K$ components.
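The component function is straightforward to compute once the components of $T \setminus L$ are known: sorting their sizes in decreasing order and summing the first $K$ also covers the case of fewer than $K$ components. A Python sketch under the same hypothetical conventions as before (adjacency lists, nodes $0, \dots, n-1$, names `components` and `upsilon` chosen by us):

```python
def components(adj: list[list[int]], removed) -> list[list[int]]:
    """Connected components (lists of nodes) of the forest T - removed."""
    seen = set(removed)
    comps = []
    for s in range(len(adj)):
        if s in seen:
            continue
        seen.add(s)
        comp, stack = [], [s]
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps


def upsilon(adj: list[list[int]], L, K: int) -> int:
    """Component function: total size of the K largest components of T - L."""
    sizes = sorted((len(c) for c in components(adj, L)), reverse=True)
    return sum(sizes[:K])   # slicing past the end yields |V - L| automatically


path7 = [[1], [0, 2], [1, 3], [2, 4], [3, 5], [4, 6], [5]]
print(upsilon(path7, {3}, 1), upsilon(path7, {3}, 2))  # 3 6
```

For the path $0 - 1 - \cdots - 6$ and $L = \{3\}$, the two components have 3 nodes each, so $\Upsilon(L, 1) = 3$ and $\Upsilon(L, K) = 6 = |V \setminus L|$ for every $K \ge 2$.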

Theorem 4. For all trees $T = (V, E)$, for all 0-forked subsets $L^+ \subseteq V$, and for all cutsize budgets
$K = 0, 1, \dots, |V| - 1$, we have that $\mathrm{lb}(L^+, K) \ge \frac{1}{2}\Upsilon(L^+, K)$.

Proof. We describe an adversarial strategy causing any algorithm to make at least $\Upsilon(L^+, K)/2$
mistakes even when the cutsize budget $K$ is known beforehand. Since $L^+$ is 0-forked, each
component of $T \setminus L^+$ is a hinge-tree. Let $F_{\max}$ be the set of the $K$ largest hinge-trees of $T \setminus L^+$, and
$E(T)$ be the set of all edges in $E$ incident to at least one node of a hinge-tree $T$. The adversary
creates at most one $\phi$-edge³ in each edge set $E(T_1)$ for all 1-hinge-trees $T_1 \in F_{\max}$, exactly one
$\phi$-edge in each edge set $E(T_2)$ for all 2-hinge-trees $T_2 \in F_{\max}$, and no $\phi$-edges in the edge set
$E(T)$ of any remaining hinge-tree $T \notin F_{\max}$. This is done as follows. By performing a depth-first
visit of $T$, the adversary can always assign disagreeing labels to the two connection nodes of each
2-hinge-tree in $F_{\max}$, and agreeing labels to the two connection nodes of each 2-hinge-tree not in
$F_{\max}$. Then, for each hinge-tree $T \in F_{\max}$, the adversary assigns a unique random label to all
nodes of $T$, forcing $|T|/2$ mistakes in expectation. The labels of the remaining hinge-trees not in
$F_{\max}$ are chosen in agreement with their connection nodes. □

Remark 1. Note that Theorem 4 holds for all query sets, not only those that are 0-forked, since
any adversarial strategy for a query set $L^+$ can force at least the same mistakes on the subset
$L \subseteq L^+$. Note also that it is not difficult to modify the adversarial strategy described in the proof
of Theorem 4 in order to deal with algorithms that are allowed to adaptively choose the query
nodes in $L$ depending on the labels of the previously selected nodes. The adversary simply assigns
the same label to each node in the query set and then forces, with the same method described in
the proof, $\frac{1}{2}\Upsilon(L^+, K/2)$ mistakes in expectation on the $K/2$ largest hinge-trees. Thus there are at most
two $\phi$-edges in each edge set $E(T)$ for all hinge-trees $T$, yielding at most $K$ $\phi$-edges in total. The
resulting (slightly weaker) bound is $\mathrm{lb}(L^+, K) \ge \frac{1}{2}\Upsilon(L^+, K/2)$. Theorem 7 and Corollary 8 can
also be easily rewritten in order to extend the results in this direction.

³ A $\phi$-edge is one where $y_i \ne y_j$.

4.2 Upper bounds 

We now bound the total number of mistakes that PRED makes on any labeled tree when the queries
are decided by SEL. We use Lemmas 1 and 2 together with the two lemmas below to prove that
$m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, K) \le 10\,\mathrm{lb}(L, K)$ for all cutsize budgets $K$ and for all node subsets $L \subseteq V$ such that
$|L| \le \frac{1}{8}|L^+_{\mathrm{SEL}}|$.

Lemma 5. For all labeled trees $(T, y)$ and for all 0-forked query sets $L^+ \subseteq V$, the number of
mistakes made by PRED satisfies $m_{\mathrm{PRED}}(L^+, y) \le \Upsilon(L^+, \Phi_T(y))$.

Proof. As in the proof of Theorem 4, we first observe that each component of $T \setminus L^+$ is a hinge-tree.
Let $E(T)$ be the set of all edges in $E$ incident to nodes of a hinge-tree $T$, and $F_\phi$ be the set of
hinge-trees such that, for all $T \in F_\phi$, at least one edge of $E(T)$ is a $\phi$-edge. Since $E(T) \cap E(T') = \emptyset$
for all distinct $T, T' \in T \setminus L^+$, we have that $|F_\phi| \le \Phi_T(y)$. Moreover, since for any $T \notin F_\phi$ there are no
$\phi$-edges in $E(T)$, the nodes of $T$ must be labeled as its connection nodes. This, together with the
prediction rule of PRED, implies that PRED makes no mistakes over any of the hinge-trees $T \notin F_\phi$.
Hence, the number of mistakes made by PRED is bounded by the sum of the sizes of all hinge-trees
$T \in F_\phi$, which (by definition of $\Upsilon$) is bounded by $\Upsilon(L^+, \Phi_T(y))$. □

The next lemma, whose proof is a bit involved, provides the relevant properties of the
component function $\Upsilon(\cdot, \cdot)$. Figure 3 helps visualize the main ingredients of the proof.

Lemma 6. Given a tree $T = (V, E)$, for all node subsets $L \subseteq V$ such that $|L| \le \frac{1}{2}|L_{\mathrm{SEL}}|$ and for
all integers $k$, we have: (a) $\Upsilon(L_{\mathrm{SEL}}, k) \le 5\,\Upsilon(L, k)$; (b) $\Upsilon(L_{\mathrm{SEL}}, 1) \le \Upsilon(L, 1)$.

Proof. We prove part (a) by constructing, via SEL, three bijective mappings $\mu_1, \mu_2, \mu_3 : \mathcal{P}_{\mathrm{SEL}} \to
\mathcal{P}_L$, where $\mathcal{P}_{\mathrm{SEL}}$ is a suitable partition of $T \setminus L_{\mathrm{SEL}}$, $\mathcal{P}_L$ is a subset of $2^V$ such that any $S \in \mathcal{P}_L$ is
all contained in a single connected component of $T \setminus L$, and the union of the domains of the three
mappings covers the whole set $T \setminus L_{\mathrm{SEL}}$. The three mappings are shown to satisfy, for all
forests $F \in \mathcal{P}_{\mathrm{SEL}}$,⁴

$$|F| \le |\mu_1(F)|, \qquad |F| \le 2|\mu_2(F)|, \qquad |F| \le 2|\mu_3(F)|.$$

Since each $S \in \mathcal{P}_L$ is all contained in a connected component of $T \setminus L$, this will enable us to
conclude that, for each tree $T' \in T \setminus L$, the forest of all trees of $T \setminus L_{\mathrm{SEL}}$ mapped (via any of these
mappings) to any node subset of $T'$ has at most $1 + 2 + 2 = 5$ times the number of nodes of $T'$. This would
prove the statement in (a).

The construction of these mappings requires some auxiliary definitions. We call L-component
each connected component of $T \setminus L_{\mathrm{SEL}}$ containing at least one node of $L$. Let $i_t$ be the $t$-th node
selected by SEL during the incremental construction of the query set $L_{\mathrm{SEL}}$. We distinguish between
four kinds of nodes chosen by SEL (see Figure 3 for an example).
Node $i_t$ is:

1. a collision node if it belongs to $L_{\mathrm{SEL}} \cap L$;

2. a $[0;0]$-node if, at time $t$, the tree $T^t_{\max}$ does not contain any node of $L$;

3. a $[0;\ge 1]$-node if, at time $t$, the tree $T^t_{\max}$ contains $k \ge 1$ nodes $j_1, \dots, j_k \in L$, all belonging
to the same connected component of $T^t_{\max} \setminus \{i_t\}$;

4. a $[\ge 1;\ge 1]$-node if $i_t \notin L$ and, at time $t$, the tree $T^t_{\max}$ contains $k \ge 2$ nodes $j_1, \dots, j_k \in L$,
which do not belong to the same connected component of $T^t_{\max} \setminus \{i_t\}$.

⁴ In this proof, $|\mu(A)|$ denotes the number of nodes in the set (of nodes) $\mu(A)$. Also, with a slight abuse of notation,
for all forests $F \in \mathcal{P}_{\mathrm{SEL}}$, we denote by $|F|$ the sum of the number of nodes in all trees of $F$. Finally, whenever $F \in \mathcal{P}_{\mathrm{SEL}}$
contains a single tree, we refer to $F$ as if it were a tree, rather than a (singleton) forest containing only one tree.

We now turn to building the three mappings. 

$\mu_1$ simply maps each tree $T' \in T \setminus L_{\mathrm{SEL}}$ that is not an L-component to the node set of $T'$ itself.
This immediately implies $|F| \le |\mu_1(F)|$ for all forests $F$ (which are actually single trees) in the
domain of $\mu_1$. Mappings $\mu_2$ and $\mu_3$ deal with the L-components of $T \setminus L_{\mathrm{SEL}}$. Let $Z$ be the set of
all such L-components, and denote by $V_{0;0}$, $V_{0;1}$, and $V_{1;1}$ the sets of all $[0;0]$-nodes, $[0;\ge 1]$-nodes,
and $[\ge 1;\ge 1]$-nodes, respectively. Observe that $|V_{1;1}| < |L|$. Combined with the assumption
$|L_{\mathrm{SEL}}| \ge 2|L|$, this implies that $|V_{0;0}| + |V_{0;1}|$ plus the total number of collision nodes must be
larger than $|L|$; as a consequence, since each L-component contains at least one node of $L$ that is not a
collision node, $|V_{0;0}| + |V_{0;1}| \ge |Z|$. Each node $i_t \in V_{0;1}$ chosen by SEL splits
the tree $T^t_{\max}$ into one component $T_{i_t}$ containing at least one node of $L$ and one or more components
all contained in a single tree $T'_t$ of $T \setminus L$. Now mapping $\mu_2$ can be constructed incrementally in
the following way. For each $[0;\ge 1]$-node selected by SEL at time $t$, $\mu_2$ sequentially maps any
L-component generated to the set of nodes in $T^t_{\max} \setminus T_{i_t}$, the latter being just a subset of a component
of $T \setminus L$. A future time step $t' > t$ might feature the selection of a new $[0;\ge 1]$-node within $T_{i_t}$, but
mapping $\mu_2$ would cover a different subset of such component of $T \setminus L$. Now, applying Lemma 1
to tree $T^t_{\max}$, we can see that $|T^t_{\max} \setminus T_{i_t}| \ge |T^t_{\max}|/2$. Since the selection rule of SEL guarantees
that the number of nodes in $T^t_{\max}$ is at least the number of nodes of any L-component, we have
$|F| \le 2|\mu_2(F)|$ for any L-component $F$ considered in the construction of $\mu_2$.

Mapping $\mu_3$ maps all the remaining L-components that are not mapped through $\mu_2$. Let $\sim$ be an
equivalence relation over $V_{0;0}$ defined as follows: $i \sim j$ iff $i$ is connected to $j$ by a path containing
only $[0;0]$-nodes and nodes in $V \setminus (L_{\mathrm{SEL}} \cup L)$. Let $i_{t_1}, i_{t_2}, \dots, i_{t_k}$ be the sequence of nodes of any
given equivalence class $[\zeta]_\sim$, sorted according to SEL's chronological selection. Lemma 3 applied
to tree $T^{t_1}_{\max}$ shows that $|T^{t_k}_{\max}| \le 2|T^{t_1}_{\max}|/k$. Moreover, the selection rule of SEL guarantees that
the number of nodes of $T^{t_k}_{\max}$ cannot be smaller than the number of nodes of any L-component.
Hence, for each equivalence class $[\zeta]_\sim$ containing $k$ nodes of type $[0;0]$, we map through $\mu_3$ a set
$F_\zeta$ of $k$ arbitrarily chosen L-components to $T^{t_1}_{\max}$. Since the size of each L-component is at most $|T^{t_k}_{\max}|$,
we can write $|F_\zeta| \le k\,|T^{t_k}_{\max}| \le 2|T^{t_1}_{\max}|$, which implies $|F_\zeta| \le 2|\mu_3(F_\zeta)|$ for all $F_\zeta$ in the domain
of $\mu_3$. Finally, observe that the number of L-components that are not mapped through $\mu_2$ cannot be
larger than $|V_{0;0}|$; thus the union of mappings $\mu_2$ and $\mu_3$ does actually map all L-components. This,
in turn, implies that the union of the domains of the three mappings covers the whole set $T \setminus L_{\mathrm{SEL}}$,
thereby concluding the proof of part (a).

The proof of (b) is built on the definition of collision nodes, $[0;0]$-nodes, $[0;\ge 1]$-nodes and
$[\ge 1;\ge 1]$-nodes given in part (a). Let $L_t \subseteq L_{\mathrm{SEL}}$ be the set of the first $t$ nodes chosen by SEL.
Here, we make a further distinction within the collision and $[0;\ge 1]$-nodes. We say that during the
selection of node $i_t \in V_{0;1}$, the nodes in $L \cap T^t_{\max}$ are captured by $i_t$. This notion of capture extends
to collision nodes by saying that a collision node $i_t \in L \cap L_{\mathrm{SEL}}$ just captures itself. We say that $i_t$ is
an initial $[0;\ge 1]$-node (resp., initial collision node) if $i_t$ is a $[0;\ge 1]$-node (resp., collision node)
such that the whole set of nodes in $L$ captured by $i_t$ contains no nodes captured so far. See Figure
3 for reference. The simple observation leading to the proof of part (b) is the following. If $i_t$ is a
$[0;0]$-node, then $T^t_{\max}$ cannot be larger than the component of $T \setminus L$ that contains $T^t_{\max}$, which in
turn cannot be larger than $\Upsilon(L, 1)$. This would already imply $\Upsilon(L_{t-1}, 1) \le \Upsilon(L, 1)$. Let now $i_t$
be an initial $[0;\ge 1]$-node and $T_{i_t}$ be the unique component of $T^t_{\max} \setminus \{i_t\}$ containing one or more
nodes of $L$. Applying Lemma 1 to tree $T^t_{\max}$ we can see that $|T_{i_t}|$ cannot be larger than $|T^t_{\max} \setminus T_{i_t}|$,
which in turn cannot be larger than $\Upsilon(L, 1)$. If at time $t' > t$ the procedure SEL selects $i_{t'} \in T_{i_t}$, then
$|T^{t'}_{\max}| \le |T_{i_t}| \le \Upsilon(L, 1)$. Hence, the maximum integer $q$ such that $\Upsilon(L_q, 1) > \Upsilon(L, 1)$ is bounded
by the number of $[\ge 1;\ge 1]$-nodes plus the number of initial $[0;\ge 1]$-nodes plus the number of
initial collision nodes. We now bound this sum as follows. The number of $[\ge 1;\ge 1]$-nodes is
clearly bounded by $|L| - 1$. Also, any initial $[0;\ge 1]$-node or initial collision node selected by SEL
captures at least a new node in $L$, thereby implying that the total number of initial $[0;\ge 1]$-nodes
or initial collision nodes must be at most $|L|$. Hence $q \le 2|L| - 1$, and since $|L_{\mathrm{SEL}}| \ge 2|L|$ we are sure that the
size of the largest tree of $T \setminus L_{\mathrm{SEL}}$ is not larger than the size of the largest component of $T \setminus L$, i.e., $\Upsilon(L, 1)$. □

We now put the above lemmas together to prove our main result concerning the number of 
mistakes made by PRED on the query set chosen by SEL. 

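Theorem 7. For all trees $T$ and all cutsize budgets $K$, the number of mistakes made by PRED on
the query set $L^+_{\mathrm{SEL}}$ satisfies

$$m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, K) \le \min_{L \subseteq V \,:\, |L| \le \frac{1}{8}|L^+_{\mathrm{SEL}}|} 10\,\mathrm{lb}(L, K).$$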

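Proof. Pick any $L \subseteq V$ such that $|L| \le \frac{1}{8}|L^+_{\mathrm{SEL}}|$. Then

$$m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, K)
\overset{\text{(Lem. 5)}}{\le} \Upsilon(L^+_{\mathrm{SEL}}, K)
\overset{\text{(A)}}{\le} \Upsilon(L_{\mathrm{SEL}}, K)
\overset{\text{(Lem. 6(a))}}{\le} 5\,\Upsilon(L^+, K)
\overset{\text{(Thm. 4)}}{\le} 10\,\mathrm{lb}(L^+, K)
\overset{\text{(B)}}{\le} 10\,\mathrm{lb}(L, K).$$

Inequality (A) holds because $L_{\mathrm{SEL}} \subseteq L^+_{\mathrm{SEL}}$, and thus $T \setminus L^+_{\mathrm{SEL}}$ has connected components of smaller
size than $T \setminus L_{\mathrm{SEL}}$. In order to apply Lemma 6(a), we need the condition $|L^+| \le \frac{1}{2}|L_{\mathrm{SEL}}|$. This
condition is seen to hold after combining Lemma 2 with our assumptions: $|L^+| \le 2|L| \le \frac{1}{4}|L^+_{\mathrm{SEL}}| \le
\frac{1}{2}|L_{\mathrm{SEL}}|$. Finally, inequality (B) holds because any adversarial strategy using query set $L^+$ can also
be used with the smaller query set $L \subseteq L^+$. □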

Note also that Theorem 4 and Lemma 5 imply the following statement about the optimality of
PRED over 0-forked query sets.

Corollary 8. For all trees $T$, for all cutsize budgets $K$, and for all 0-forked query sets $L^+ \subseteq V$,
the number of mistakes made by PRED satisfies $m_{\mathrm{PRED}}(L^+, K) \le 2\,\mathrm{lb}(L^+, K)$.

In the rest of this section we derive a more interpretable bound on $m_{\mathrm{PRED}}(L^+, y)$ based on the function $\Psi^*$ introduced in [6]. To this end, we prove that $L_{\mathrm{SEL}}$ minimizes $\Psi^*$ up to constant factors, and thus is an optimal query set according to the analysis of [6].
For any subset $V' \subseteq V$, let $\Gamma(V', V \setminus V')$ be the number of edges between nodes of $V'$ and nodes of $V \setminus V'$. Using this notation, we can write

$$\Psi^*(L) = \max_{V' \subseteq V \setminus L} \frac{|V'|}{\Gamma(V', V \setminus V')}.$$
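To make $\Psi^*$ concrete on small examples, the following brute-force sketch enumerates every nonempty $V' \subseteq V \setminus L$ and also computes $\Upsilon(L, 1)$, the size of the largest component of $T \setminus L$, which Lemmas 9 and 10 compare against $\Psi^*$. The function names are hypothetical, and the exponential enumeration is only meant for tiny trees; it is not how SEL operates.

```python
from fractions import Fraction
from itertools import combinations


def boundary(edges, subset):
    """Gamma(V', V - V'): number of edges with exactly one endpoint in subset."""
    return sum((a in subset) != (b in subset) for a, b in edges)


def psi_star(nodes, edges, L):
    """Brute-force Psi*(L): max of |V'| / Gamma(V', V - V') over nonempty V' inside V - L.

    Exponential in |V - L|. L must be nonempty so that, in a connected tree,
    every candidate V' has at least one boundary edge."""
    if not L:
        raise ValueError("L must be nonempty")
    free = [v for v in nodes if v not in L]
    best = Fraction(0)
    for k in range(1, len(free) + 1):
        for combo in combinations(free, k):
            subset = set(combo)
            best = max(best, Fraction(len(subset), boundary(edges, subset)))
    return best


def largest_component(nodes, edges, L):
    """Upsilon(L, 1): number of nodes in the largest connected component of T - L."""
    adj = {v: [] for v in nodes if v not in L}
    for a, b in edges:
        if a in adj and b in adj:
            adj[a].append(b)
            adj[b].append(a)
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


# Path 0-1-...-6 with its middle node queried.
nodes = range(7)
edges = [(i, i + 1) for i in range(6)]
print(psi_star(nodes, edges, {3}), largest_component(nodes, edges, {3}))  # 3 3
```

On this path both quantities equal 3, in agreement with Lemma 9(b), and adding query nodes can only decrease $\Psi^*$, as Lemma 11 states.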

Lemma 9. For any tree T = (V, E) and any $L \subseteq V$ the following holds.

(a) A maximizer of $\frac{|V'|}{\Gamma(V', V \setminus V')}$ exists which is included in the node set of a single component of $T \setminus L$;

(b) $\Psi^*(L) \le \Upsilon(L, 1)$.

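Proof. Let $V'_{\max}$ be any maximizer of $\frac{|V'|}{\Gamma(V', V \setminus V')}$, and assume that the nodes of $V'_{\max}$ belong to $k \ge 2$ components $T_1, T_2, \ldots, T_k$ of $T \setminus L$ (otherwise there is nothing to prove). Let $V'_i \subseteq V'_{\max}$ be the subset of nodes included in the node set of $T_i$, for $i = 1, \ldots, k$. Then $|V'_{\max}| = \sum_{i \le k} |V'_i|$ and $\Gamma(V'_{\max}, V \setminus V'_{\max}) = \sum_{i \le k} \Gamma(V'_i, V \setminus V'_i)$, because no edge joins two different components of $T \setminus L$. Now let $i^* = \arg\max_{i \le k} |V'_i| / \Gamma(V'_i, V \setminus V'_i)$. Since $\left(\sum_i a_i\right) / \left(\sum_i b_i\right) \le \max_i a_i / b_i$ for all $a_i, b_i > 0$, we immediately obtain

$$\frac{|V'_{i^*}|}{\Gamma(V'_{i^*}, V \setminus V'_{i^*})} \ge \frac{|V'_{\max}|}{\Gamma(V'_{\max}, V \setminus V'_{\max})},$$

so $V'_{i^*}$ is also a maximizer, and it is included in the single component $T_{i^*}$. This proves (a). Part (b) is an immediate consequence of (a), since any $V'$ included in a single component of $T \setminus L$ satisfies $|V'| \le \Upsilon(L, 1)$ and $\Gamma(V', V \setminus V') \ge 1$. □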
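The ratio inequality used in the proof holds because the left-hand side is a weighted average of the individual ratios, with weights $b_i / \sum_j b_j$. Spelled out, with $a_i = |V'_i|$ and $b_i = \Gamma(V'_i, V \setminus V'_i)$:

```latex
\frac{\sum_{i \le k} a_i}{\sum_{i \le k} b_i}
  = \sum_{i \le k} \frac{b_i}{\sum_{j \le k} b_j} \cdot \frac{a_i}{b_i}
  \le \max_{i \le k} \frac{a_i}{b_i},
\qquad a_i = |V'_i|, \quad b_i = \Gamma(V'_i, V \setminus V'_i) > 0 .
```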



Lemma 10. For any tree T = (V, E) and any 0-forked subset $L^+ \subseteq V$ we have $\Upsilon(L^+, 1) \le 2\Psi^*(L^+)$.

Proof. Let $T_{\max}$ be the largest component of $T \setminus L^+$ and $V_{\max}$ be its node set. Since $L^+$ is a 0-forked query set, $T_{\max}$ must be either a 1-hinge-tree or a 2-hinge-tree. Since the only edges that connect a hinge-tree to external nodes are the edges leading to connection nodes, we find that $\Gamma(V_{\max}, V \setminus V_{\max}) \le 2$. We can now write

$$\Psi^*(L^+) = \max_{V' \subseteq V \setminus L^+} \frac{|V'|}{\Gamma(V', V \setminus V')} \ge \frac{|V_{\max}|}{\Gamma(V_{\max}, V \setminus V_{\max})} \ge \frac{|V_{\max}|}{2} = \frac{\Upsilon(L^+, 1)}{2},$$

thereby concluding the proof. □

Lemma 11. For any tree T = (V, E) and any subset $L \subseteq V$ we have $\Psi^*(L^+) \le \Psi^*(L)$.

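Proof. Let $V'_{\max}$ be any set maximizing $\frac{|V'|}{\Gamma(V', V \setminus V')}$ over $V' \subseteq V \setminus L^+$. Since $V'_{\max} \subseteq V \setminus L^+$, $V'_{\max}$ cannot contain any node of $L \subseteq L^+$. Hence

$$\Psi^*(L^+) = \frac{|V'_{\max}|}{\Gamma(V'_{\max}, V \setminus V'_{\max})} \le \max_{V' \subseteq V \setminus L} \frac{|V'|}{\Gamma(V', V \setminus V')} = \Psi^*(L),$$

which concludes the proof. □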

We now put together the previous lemmas to show that the query set $L_{\mathrm{SEL}}$ minimizes $\Psi^*$ up to constant factors.

Theorem 12. For any tree T = (V, E) we have

$$\Psi^*(L_{\mathrm{SEL}}) \le \min_{L \subseteq V \,:\, |L| \le \frac{1}{4}|L_{\mathrm{SEL}}|} 2\Psi^*(L).$$
Proof. Let L be a query set such that $|L| \le |L_{\mathrm{SEL}}|/4$. Then we have the following chain of inequalities:

$$\Psi^*(L_{\mathrm{SEL}}) \overset{\text{(Lem. 9(b))}}{\le} \Upsilon(L_{\mathrm{SEL}}, 1) \overset{\text{(Lem. 6(b))}}{\le} \Upsilon(L^+, 1) \overset{\text{(Lem. 10)}}{\le} 2\Psi^*(L^+) \overset{\text{(Lem. 11)}}{\le} 2\Psi^*(L).$$

In order to apply Lemma 6(b), we need the condition $|L^+| \le \frac{1}{2}|L_{\mathrm{SEL}}|$. This condition holds because, by Lemma 2, $|L^+| \le 2|L| \le \frac{1}{2}|L_{\mathrm{SEL}}|$. □

Finally, as promised, the following corollary contains an interpretable mistake bound for PRED run with a query set returned by SEL.

Corollary 13. For any labeled tree (T, y), the number of mistakes made by PRED when run with query set $L^+_{\mathrm{SEL}}$ satisfies

$$m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, y) \le 4 \min_{L \subseteq V \,:\, |L| \le \frac{1}{8}|L^+_{\mathrm{SEL}}|} \Psi^*(L)\,\Phi_T(y).$$

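Proof. Observe that PRED assigns labels to nodes in $V \setminus L^+_{\mathrm{SEL}}$ so as to minimize the resulting cutsize given the labels in the query set $L^+_{\mathrm{SEL}}$. We can then invoke [6, Lemma 1], which bounds the number of mistakes made by the mincut strategy in terms of the function $\Psi^*$ and the cutsize. This yields

$$m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, y) \overset{\text{([6, Lemma 1])}}{\le} 2\Psi^*(L^+_{\mathrm{SEL}})\,\Phi_T(y) \overset{(A)}{\le} 2\Psi^*(L_{\mathrm{SEL}})\,\Phi_T(y) \overset{\text{(Thm. 12)}}{\le} 4\Psi^*(L)\,\Phi_T(y).$$

Inequality (A) holds because $L_{\mathrm{SEL}} \subseteq L^+_{\mathrm{SEL}}$, so the maximum defining $\Psi^*(L^+_{\mathrm{SEL}})$ ranges over a subfamily of the sets considered in $\Psi^*(L_{\mathrm{SEL}})$ (the same argument as in Lemma 11). In order to apply Theorem 12 we need the condition $|L| \le \frac{1}{4}|L_{\mathrm{SEL}}|$, which follows from a simple combination of Lemma 2 and our assumptions: $|L| \le \frac{1}{8}|L^+_{\mathrm{SEL}}| \le \frac{1}{4}|L_{\mathrm{SEL}}|$. □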

Remark 2. A mincut algorithm exists which efficiently predicts even when the query set L is not 0-forked (thereby gaining a factor of 2 in the cardinality of the competing query sets L; see Theorem 7 and Corollary 13). This algorithm is a "batch" variant of the TreeOpt algorithm analyzed in [?]. The algorithm can be implemented in such a way that the total time for predicting $|V| - |L|$ labels is $O(|V|)$.



4.3 Automatic calibration of the number of queries 

A key aspect of the query selection task is deciding when to stop asking queries. Since the more queries are asked, the fewer mistakes are made afterwards, a reasonable way to deal with this trade-off is to minimize the number of queries issued during the selection phase plus the number of mistakes made during the prediction phase. For a given pair A = (S, P) of selection and prediction algorithms, we denote by $[q+m]_A$ the sum of queries made by S and prediction mistakes made by P. Similarly to $m_A$ introduced earlier, $[q+m]_A$ has to scale with the cutsize $\Phi_T(y)$ of the labeled tree (T, y) under consideration.

As a simple example of computing $[q+m]_A$, consider a line graph T = (V, E). Since each query set on T is 0-forked, Theorem 4 and Corollary 8 ensure that an optimal strategy for selecting the queries in T is choosing a sequence of nodes such that the distance between any pair of neighboring nodes in L is the same. The total number of mistakes that can be forced on $V \setminus L$ is, up to a constant factor, $\Phi_T(y)\,|V|/|L|$, so that

$$[q+m]_A = \Theta\left(|L| + \frac{|V|\,\Phi_T(y)}{|L|}\right). \qquad (1)$$
Minimizing the above expression over $|L|$ clearly requires knowledge of $\Phi_T(y)$, which is typically unavailable. In this section we investigate a method for choosing the number of queries when the labeling is known to be sufficiently regular, that is, when a bound K is known on the cutsize $\Phi_T(y)$.⁵
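The minimization over $|L|$ for the line graph can be worked out explicitly. Assuming, as in the line-graph example, that queries plus forced mistakes behave like $|L| + K|V|/|L|$ up to constant factors when $\Phi_T(y) \le K$ (this scaling is our reading of the example, used here only for illustration), elementary calculus gives:

```latex
f(x) = x + \frac{K|V|}{x}, \qquad
f'(x) = 1 - \frac{K|V|}{x^2} = 0
\;\Longrightarrow\;
x^\star = \sqrt{K|V|}, \qquad
f(x^\star) = 2\sqrt{K|V|}, \qquad
\frac{|V|}{x^\star} = \sqrt{|V|/K}.
```

Since $f''(x) = 2K|V|/x^3 > 0$, this stationary point is the minimum; the spacing $\sqrt{|V|/K}$ between consecutive query nodes is the one SEL* ends up using on a line graph.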



We now show that when a bound K on the cutsize is known, a simple modification of SEL (we call it SEL*) exists which optimizes the $[q+m]_A$ criterion. This means that the combination of SEL* and PRED can optimally trade off (up to constant factors) queries against mistakes.

Given a selection algorithm S and a prediction algorithm P, define $[q+m]_{(S,P)}$ by

$$[q+m]_{(S,P)} = \min_{Q} \bigl(Q + m_P(L_S(Q), K)\bigr),$$

where $L_S(Q)$ is the query set output by S given query budget Q, and $m_P(L_S(Q), K)$ is the maximum number of mistakes made by P with query set $L_S(Q)$ on any labeling y with $\Phi_T(y) \le K$ (see its definition earlier in the paper). Define also $[q+m]_{\mathrm{OPT}} = \min_{S,P} [q+m]_{(S,P)}$, where OPT = (S*, P*) is an optimal pair of selection and prediction algorithms. If SEL knows the size of the query set $L^*$ selected by S*, so that SEL can choose a query budget $Q = 8|L^*|$, then a direct application of Theorem 7 guarantees that $|L^+_{\mathrm{SEL}}| + m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, K) \le 10\,[q+m]_{\mathrm{OPT}}$. We now show that SEL*, the announced modification of SEL, can efficiently search for a query set size Q such that $Q + m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}(Q), K) = O([q+m]_{\mathrm{OPT}})$ when only K, rather than $|L^*|$, is known. In fact, Theorem 4 and Corollary 8 ensure that $m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}}, K) = \Theta(\Upsilon(L^+_{\mathrm{SEL}}, K))$. When K is given as side information, SEL* can operate as follows. For each $t \le |V|$, the algorithm builds the query set $L^+_t$ and computes $\Upsilon(L^+_t, K)$. Then it finds the smallest value $t^*$ minimizing $t + \Upsilon(L^+_t, K)$ over all $t \le |V|$, and selects $L^+_{\mathrm{SEL}^*} = L^+_{t^*}$. We stress that the above is only possible because the algorithm can estimate within constant factors its own future mistake bound (Theorem 4 and Corollary 8), and because the combination of SEL and PRED is competitive against all query sets whose size is a constant fraction of $|L^+_{\mathrm{SEL}}|$ (see Theorem 7). Putting these facts together, we have shown the following result.

Theorem 14. For all trees T, for all cutsize budgets K, and for all labelings y such that $\Phi_T(y) \le K$, the combination of SEL* and PRED achieves $|L^+_{\mathrm{SEL}^*}| + m_{\mathrm{PRED}}(L^+_{\mathrm{SEL}^*}, K) = O([q+m]_{\mathrm{OPT}})$ when K is given to SEL* as input.

5 In [1] a labeling y of a graph G is said to be α-balanced if, after the elimination of all φ-edges, each connected component of G is not smaller than $\alpha|V|$ for some known constant $\alpha \in (0, 1)$. In the case of labeled trees, the α-balancing condition is stronger than our regularity assumption. This is because any α-balanced labeling y implies $\Phi_T(y) \le 1/\alpha - 1$. In fact, getting back to the line graph example, we immediately see that, if y is α-balanced, then the optimal number of queries $|L|$ is of the order of $\sqrt{|V|(1/\alpha - 1)}$, which is also the order of $\inf_A [q+m]_A$.

Just to give a few simple examples of how SEL* works, consider a star graph. It is not difficult to see that in this case $t^* = 1$ independent of K, i.e., SEL* always selects the center of the star, which is intuitively the optimal choice. If T is the "binary system" mentioned in the introduction, then $t^* = 2$ and SEL* always selects the centers of the two stars, again independent of K. At the other extreme, if T is a line graph, then SEL* picks the query nodes in such a way that the distance between two consecutive nodes of L in T is (up to a constant factor) equal to $\sqrt{|V|/K}$. Hence $|L| = \Theta(\sqrt{|V|K})$, which is the minimum of (1) over $|L|$ when $\Phi_T(y) \le K$.
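The final step of SEL* is a one-dimensional scan over t. The sketch below assumes a hypothetical callable `mistake_estimate(t)` standing in for $\Upsilon(L^+_t, K)$; building the query sets $L^+_t$ themselves is not shown. In the usage line we plug in the proxy $K|V|/t$ suggested by the line-graph discussion, which is an assumption rather than a computed value of $\Upsilon$.

```python
from typing import Callable


def choose_t_star(n: int, mistake_estimate: Callable[[int], float]) -> int:
    """Smallest t in 1..n minimizing t + mistake_estimate(t).

    mistake_estimate(t) is a hypothetical stand-in for Upsilon(L_t^+, K),
    which SEL* would compute after building the t-th query set."""
    best_t, best_cost = 1, 1 + mistake_estimate(1)
    for t in range(2, n + 1):
        cost = t + mistake_estimate(t)
        if cost < best_cost:  # strict comparison keeps the smallest minimizer
            best_t, best_cost = t, cost
    return best_t


# Line graph with |V| = 10000 and K = 4, using the assumed proxy K|V|/t.
print(choose_t_star(10000, lambda t: 4 * 10000 / t))  # 200
```

Here $t^* = 200 = \sqrt{K|V|}$, which corresponds to a spacing of $|V|/t^* = 50 = \sqrt{|V|/K}$ between consecutive query nodes; a constant estimate (as for a star once its center is queried) gives $t^* = 1$.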

5 On the prediction of general graphs 

In this section we provide a general lower bound for prediction on arbitrary labeled graphs (G, y). We then contrast this lower bound with some results contained in Afshani et al. [1].

Let $\Phi^R_G(y)$ be the sum of the effective resistances (see, e.g., [12]) on the φ-edges of G = (V, E).
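For readers who want to evaluate $\Phi^R_G(y)$ on small graphs, here is a sketch that computes each effective resistance exactly: it grounds one endpoint, injects a unit current at the other, and solves the reduced Laplacian system over the rationals. This is a standard textbook technique chosen for illustration (unit edge resistances and a connected graph on nodes 0..n-1 are assumed), not a procedure from the paper, and the function names are hypothetical.

```python
from fractions import Fraction


def effective_resistance(n, edges, u, v):
    """Exact effective resistance between u and v in a connected graph on nodes 0..n-1.

    Grounds v, injects a unit current at u, and solves the reduced Laplacian system
    by Gauss-Jordan elimination; the resulting potential at u is the resistance."""
    idx = [w for w in range(n) if w != v]
    pos = {w: k for k, w in enumerate(idx)}
    m = len(idx)
    A = [[Fraction(0)] * (m + 1) for _ in range(m)]  # augmented matrix [L_v | b]
    for a, b in edges:
        for s, d in ((a, b), (b, a)):
            if s != v:
                A[pos[s]][pos[s]] += 1
                if d != v:
                    A[pos[s]][pos[d]] -= 1
    A[pos[u]][m] = Fraction(1)  # unit current injected at u
    for col in range(m):
        piv = next(r for r in range(col, m) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        for r in range(m):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [p - f * c for p, c in zip(A[r], A[col])]
    return A[pos[u]][m] / A[pos[u]][pos[u]]


def resistance_cutsize(n, edges, y):
    """Sum of effective resistances over the phi-edges (endpoints with different labels)."""
    return sum((effective_resistance(n, edges, a, b) for a, b in edges if y[a] != y[b]), Fraction(0))


print(resistance_cutsize(3, [(0, 1), (1, 2), (0, 2)], [1, 1, -1]))  # 4/3
print(resistance_cutsize(3, [(0, 1), (1, 2)], [1, 1, -1]))  # 1
```

On a tree every edge is a bridge with effective resistance 1, so there $\Phi^R_G(y)$ coincides with the cutsize, as the second line illustrates on a 3-node path.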

The theorem below shows that any prediction algorithm using any query set L such that $|L| \le \frac{1}{2}|V|$ makes at least of the order of $\Phi^R_G(y)$ mistakes. This lower bound holds even if the algorithm is allowed to use a randomized adaptive strategy for choosing the query set L, that is, a randomized strategy where the next node of the query set is chosen after receiving the labels of all previously chosen nodes.

Theorem 15. Given a graph G = (V, E), for all $K \le |V|/2$, there exists a randomized labeling strategy such that for all prediction algorithms A choosing a query set of size $|L| \le \frac{1}{2}|V|$ via a possibly randomized adaptive strategy, the expected number of mistakes made by A on the remaining nodes $V \setminus L$ is at least $K/4$, while $\Phi^R_G(y) \le K$.

The above lower bound (whose proof is omitted) appears to contradict an argument by Afshani et al. [1, Section 5]. This argument establishes that for any $\varepsilon > 0$ there exists a randomized algorithm using at most $K\ln(3/\varepsilon) + K\ln(|V|/K) + O(K)$ queries on any given graph G = (V, E) with cutsize K, and making at most $\varepsilon|V|$ mistakes on the remaining vertices. The contradiction is easily exhibited through the following simple counterexample: assume G is a line graph where all node labels are +1 except for $K = o(|V|/\ln|V|)$ randomly chosen nodes, which are also given random labels. For all $\varepsilon = o(K/|V|)$, the above argument implies that order of $K\ln|V| = o(|V|)$ queries are sufficient to make at most $\varepsilon|V| = o(K)$ mistakes on the remaining nodes, among which $\Omega(K)$ have random labels, which is clearly impossible.

6 Efficient Implementation 

In this section we describe an efficient implementation of SEL and PRED. We will show that the total time needed for selecting Q queries is $O(|V|\log Q)$, the total time for predicting $|V| - Q$ nodes is $O(|V|)$, and that the overall memory space is again $O(|V|)$.

In order to locate the largest subtree of $T \setminus L_{t-1}$, the algorithm maintains a priority deque [11] D containing at most Q items. This data structure allows one to find the item with the smallest or the largest key in time $O(1)$, and to eliminate it in time $O(\log Q)$. In addition, the insertion of a new element takes time $O(\log Q)$.

Each item in D has two records: a reference to a node in T and the priority key associated with that node. Just before the selection of the t-th query node $i_t$,⁶ the Q references point to nodes contained in the Q largest subtrees in $T \setminus L_{t-1}$, while the corresponding keys are the sizes of such subtrees. Hence at time t the item top of D, i.e., the one having the largest key, points to a node in $T^{\max}_t$.

6 If t = 1 the priority deque D is empty.

First, during an initialization step, SEL creates, for each edge $(i, j) \in E$, a directed edge $[i, j]$ from i to j and the twin directed edge $[j, i]$ from j to i. During the construction of $L_{\mathrm{SEL}}$ the algorithm also stores and maintains the current size $\sigma(D)$ of D, i.e., the total number of items contained in D. We first describe the way SEL finds node $i_t$ in $T^{\max}_t$. Then we will see how SEL can efficiently augment the query set $L_{\mathrm{SEL}}$ to obtain $L^+_{\mathrm{SEL}}$.

Starting from the node r of $T^{\max}_t$ referred to by D,⁷ SEL performs a depth-first visit of $T^{\max}_t$, followed by the elimination of the item with the largest key in D. For the sake of simplicity, consider $T^{\max}_t$ as rooted at node r. Given any edge $(i, j)$, we let $T_i$ and $T_j$ be the two subtrees obtained from $T^{\max}_t$ after removing edge $(i, j)$, where $T_i$ contains node i, and $T_j$ contains node j.

During each backtracking step of the depth-first visit from a node i to a node j, SEL stores the number of nodes $|T_i|$ contained in $T_i$. This number gets associated with the directed edge $[j, i]$. Observe that this task can be accomplished very efficiently, since $|T_i|$ is equal to 1 plus the total number of nodes of the subtrees $T_{c(i)}$ over all children $c(i)$ of i. These numbers can be recursively calculated by summing the size values that SEL associates with all directed edges $[i, c(i)]$ in the previous backtracking steps. Just after storing the value $|T_i|$, the algorithm also stores $|T_j| = |T^{\max}_t| - |T_i|$ and associates this value with the twin directed edge $[i, j]$. The size of $T^{\max}_t$ is then stored in D as the key record of the pointer to node r.
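The backtracking computation above can be sketched in Python as follows. The sketch assumes the current tree is given as an adjacency dict, replaces the recursive depth-first visit with an explicit stack, and stores the key of each directed edge $[i, j]$ as the number of nodes on j's side. The helper `select_node` (a hypothetical name) takes, for every node, the largest key on its outgoing directed edges and returns a minimizer, which is how the next paragraph characterizes the node chosen by SEL; ties are broken by dictionary order.

```python
def directed_edge_sizes(adj, root):
    """Map each directed edge (i, j) of a tree to the number of nodes on j's side
    once the undirected edge (i, j) is removed.

    adj: dict node -> list of neighbours, describing one tree (e.g. T^max_t)."""
    parent = {root: None}
    order = [root]  # discovery order: every parent appears before its children
    stack = [root]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                order.append(v)
                stack.append(v)
    below = {}  # below[u] = |T_u|, the size of the subtree hanging from u
    for u in reversed(order):  # children are processed before their parent
        below[u] = 1 + sum(below[c] for c in adj[u] if parent[c] == u)
    n = len(order)
    size = {}
    for u in order:
        p = parent[u]
        if p is not None:
            size[(p, u)] = below[u]      # key stored on [parent, child]
            size[(u, p)] = n - below[u]  # key stored on the twin edge [child, parent]
    return size


def select_node(adj, root):
    """Return a node minimizing the size of the largest component left after removing it."""
    size = directed_edge_sizes(adj, root)

    def largest_piece(i):
        return max((size[(i, j)] for j in adj[i]), default=0)

    return min(adj, key=largest_piece)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(select_node(path, 0), select_node(star, 1))  # 2 0
```

Each node and edge is touched a constant number of times, so one call runs in time linear in the size of the tree, matching the $O(|T^{\max}_t|)$ cost discussed in the text.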

It is now important to observe that the quantity $\sigma(T^{\max}_t, i)$ used by SEL (see Section 3) is simply the largest key associated with the directed edges $[i, j]$ over all j such that $(i, j)$ is an edge of $T^{\max}_t$. Hence, a new depth-first visit is enough to find in time $O(|T^{\max}_t|)$ the t-th node $i_t = \arg\min_{i \in T^{\max}_t} \sigma(T^{\max}_t, i)$ selected by SEL. Let $N(i_t)$ be the set of all nodes adjacent to node $i_t$ in $T^{\max}_t$. For all nodes $i' \in N(i_t)$, SEL compares $|T_{i'}|$ to the smallest key bottom stored in D. We have three cases:

1. If $|T_{i'}| \le$ bottom and $\sigma(D) \ge Q - t$, then the algorithm does nothing, since $T_{i'}$ (or subtrees thereof) will never be largest in the subsequent steps of the construction of $L_{\mathrm{SEL}}$, i.e., there will not exist any node $i_{t'}$ with $t' > t$ such that $i_{t'} \in T_{i'}$.

2. If $|T_{i'}| \le$ bottom and $\sigma(D) < Q - t$, or if $|T_{i'}| >$ bottom and $\sigma(D) < Q$, then SEL inserts a pointer to $i'$ together with the associated key $|T_{i'}|$. Note that, since D is not full (i.e., $\sigma(D) < Q$), the algorithm need not eliminate any item in D.

3. If $|T_{i'}| >$ bottom and $\sigma(D) = Q$, then SEL eliminates from D the item having the smallest key, and inserts a pointer to $i'$, together with the associated key $|T_{i'}|$.
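The three cases can be sketched as follows. A sorted Python list stands in for the priority deque of [11], so insertions here cost $O(Q)$ rather than $O(\log Q)$; the class and function names are hypothetical, and an empty deque is treated as having bottom $= -\infty$.

```python
import bisect


class BoundedDeque:
    """Stand-in for the priority deque D: (key, node) pairs kept sorted by key.

    A sorted list gives O(1) access to both ends but O(Q) insertions,
    unlike the O(log Q) priority deque assumed in the text."""

    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)

    def bottom(self):
        return self.items[0][0] if self.items else float("-inf")

    def pop_top(self):
        return self.items.pop()  # item with the largest key

    def pop_bottom(self):
        return self.items.pop(0)  # item with the smallest key

    def insert(self, key, node):
        bisect.insort(self.items, (key, node))


def offer(D, size, node, t, Q):
    """Apply the three cases to a neighbour `node` of i_t whose subtree has `size` nodes."""
    sigma, bottom = len(D), D.bottom()
    if size <= bottom and sigma >= Q - t:
        return "ignore"  # case 1: this subtree can never become the largest again
    if (size <= bottom and sigma < Q - t) or (size > bottom and sigma < Q):
        D.insert(size, node)  # case 2: D is not full, nothing to evict
        return "insert"
    D.pop_bottom()  # case 3: size > bottom and D is full
    D.insert(size, node)
    return "replace"


D = BoundedDeque()
for size, node in [(5, 10), (2, 11), (1, 12), (7, 13), (6, 14)]:
    print(offer(D, size, node, t=1, Q=3))  # insert, insert, ignore, insert, replace
print(D.items)  # [(5, 10), (6, 14), (7, 13)]
```

The size of D never exceeds Q, because case 2 only inserts when D is not full and case 3 evicts before inserting.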

Finally, SEL eliminates node $i_t$ and all edges (both undirected and directed) incident to it. Note that this elimination implies that we can easily perform a depth-first visit within $T^{\max}_s$ for each $s \le Q$, since $T^{\max}_s$ is always completely disconnected from the rest of the tree T.

In order to turn $L_{\mathrm{SEL}}$ into $L^+_{\mathrm{SEL}}$, the algorithm proceeds incrementally, using a technique borrowed from [7]. Just after the selection of the first node $i_1$, a depth-first visit starting from $i_1$ is performed. During each backtracking step of this visit, the algorithm associates with each edge $(i, j)$ the node closer to $i_1$ between i and j. In other words, SEL assigns a direction to each undirected edge $(i, j)$ so as to be able to efficiently find the path connecting each given node i to $i_1$. When the t-th node $i_t$ is selected, SEL follows these edge directions from $i_t$ towards $i_1$. Let us denote by $\pi(i, j)$ the path connecting node i to node j. During the traversal of $\pi(i_1, i_t)$, the algorithm assigns a special mark to each visited node, until it reaches the first node $j \in \pi(i_1, i_t)$ which has already been marked. Let $\eta(i, L)$ be the maximum number of edge-disjoint paths connecting i to nodes in the query set L. Observe that all nodes i for which $\eta(i, L_t) > \eta(i, L_{t-1})$ must necessarily belong to $\pi(i_t, j)$. We have $\eta(i_t, L_t) = 1$, and $\eta(i, L_t) = 2$ for all internal nodes i in the path $\pi(i_t, j)$. Hence, j is the unique node that we may need to add as a new fork node (if $j \notin \mathrm{FORK}(L_{t-1})$). In fact, j is the unique node such that the number of edge-disjoint paths connecting it to query nodes may increase, and actually become larger than 2.

7 In the initial step t = 1 (i.e., when $T^{\max}_1 = T$), node r can be chosen arbitrarily.

Therefore, if $j \in L^+_{t-1}$ we need not add any fork node during the incremental construction of $L^+_{\mathrm{SEL}}$. On the other hand, if $j \notin L^+_{t-1}$ then $\eta(j, L_{t-1}) = 2$, which implies $\eta(j, L_t) = 3$. This is the case when SEL views j as a new fork node to be added to the query set $L_{\mathrm{SEL}}$ under consideration.
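The incremental fork detection above can be sketched as follows, assuming the tree is given as an adjacency dict and the query nodes arrive in the order SEL selects them. Two liberties are taken: the orientation of edges towards $i_1$ is computed with a breadth-first search (on a tree this yields the same parent pointers as the depth-first visit), and a newly selected node that already lies on a marked path is simply skipped, since it becomes a query node rather than a fork node. Function names are hypothetical.

```python
from collections import deque


def orient_towards(adj, root):
    """Parent pointers directing every tree edge towards root (BFS instead of DFS)."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent


def fork_nodes(adj, queries):
    """Incrementally collect the fork nodes created by the sequence of query nodes."""
    root = queries[0]
    parent = orient_towards(adj, root)
    marked, chosen, forks = {root}, {root}, set()
    for q in queries[1:]:
        chosen.add(q)
        if q in marked:  # q already lies on a marked path: no new branch is created
            continue
        node = q
        while node not in marked:  # mark the path from q up to the first marked node j
            marked.add(node)
            node = parent[node]
        if node not in chosen:  # j is an internal path node: it now has 3 disjoint paths
            forks.add(node)
    return forks


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 4] for i in range(5)}
print(fork_nodes(star, [1, 2, 3]), fork_nodes(path, [0, 4, 2]))  # {0} set()
```

Every node is marked at most once, so the total work is linear in the number of marked nodes, in line with the running-time argument that follows. The empty result on the path agrees with the fact that every query set on a line graph is 0-forked.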

In order to bound the total time required by SEL for selecting Q nodes, we rely on Lemma 3, showing that $|T^{\max}_t| \le 2|V|/t$. The two depth-first visits performed for each node $i_t$ take $O(|T^{\max}_t|)$ steps. Hence the overall running time spent on the depth-first visits is $O\bigl(\sum_{t \le Q} |V|/t\bigr) = O(|V|\log Q)$.

The total time spent for incrementally finding the fork nodes of $L_{\mathrm{SEL}}$ is linear in the number of nodes marked by the algorithm, which is at most $|V|$. Finally, handling the priority deque D takes $|V|$ times the worst-case time for eliminating an item with the smallest (or largest) key or adding a new item. This is again $O(|V|\log Q)$.
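The harmonic-sum step behind the $O(|V|\log Q)$ bound for the depth-first visits, using the bound $|T^{\max}_t| \le 2|V|/t$ of Lemma 3, reads:

```latex
\sum_{t=1}^{Q} O\bigl(|T^{\max}_t|\bigr)
  \le \sum_{t=1}^{Q} O\!\left(\frac{2|V|}{t}\right)
  = O\bigl(|V|\, H_Q\bigr)
  = O(|V| \log Q),
\qquad H_Q = \sum_{t=1}^{Q} \frac{1}{t} \le 1 + \ln Q .
```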

We now turn to the implementation of the prediction phase. PRED operates in two phases. In the first phase, the algorithm performs a depth-first visit of each hinge-tree, starting from each of its connection nodes (thereby visiting the nodes of all 1-hinge-trees once, and the nodes of all 2-hinge-trees twice). During these visits, we add to the nodes a tag containing (i) the label of the connection node from which the depth-first visit started, and (ii) the distance between that connection node and the currently visited node. In the second phase, we perform a second depth-first visit, this time on the whole tree T. During this visit, we predict each node $i \in V \setminus L$ with the label coupled with the smaller distance stored in the tags of i.⁸ The total time of these visits is linear in $|V|$, since each node of T gets visited at most 3 times.
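A compact sketch of this prediction rule, assuming a 0-forked query set so that every hinge-tree has at most two connection nodes: each non-queried node receives the label of its nearest query node. Instead of one depth-first visit per connection node, the sketch uses a single multi-source breadth-first search that never passes through query nodes. This swapped-in technique yields the same nearest-connection-node labels except possibly on ties, which here are broken by queue order. The function name is hypothetical.

```python
from collections import deque


def predict_nearest_connection(adj, labels):
    """Label each non-queried node with the label of its nearest query node.

    adj: dict node -> neighbours of a tree; labels: dict query node -> +1 or -1.
    The search never crosses query nodes, so every node is reached first from a
    connection node of its own hinge-tree."""
    pred = {}
    queue = deque()
    for q, y in labels.items():  # distance-1 layer: neighbours of query nodes
        for v in adj[q]:
            if v not in labels and v not in pred:
                pred[v] = y
                queue.append(v)
    while queue:  # later layers inherit the label of the node they are reached from
        u = queue.popleft()
        for v in adj[u]:
            if v not in labels and v not in pred:
                pred[v] = pred[u]
                queue.append(v)
    return pred


path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(predict_nearest_connection(path, {0: 1, 6: -1}))
# {1: 1, 5: -1, 2: 1, 4: -1, 3: 1}
```

On this 2-hinge path the equidistant middle node 3 goes to +1 only because node 0 is processed first; either way the resulting labeling cuts a single edge, as a mincut labeling should.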

7 Conclusions and ongoing work 

The results proven in this paper characterize, up to constant factors, the optimal algorithms for 
adversarial active learning on trees in two main settings. In the first setting the goal is to minimize 
the number of mistakes on the non-queried vertices under a certain query budget. In the second 
setting the goal is to minimize the sum of queries and mistakes under no restriction on the number 
of queries. 

An important open question is the extension of our results to the general case of active learning 
on graphs. While a direct characterization of optimality on general graphs is likely to require new 
analytical tools, an alternative line of attack is reducing the graph learning problem to the tree 
learning problem via the use of spanning trees. Certain types of spanning trees, such as random 
spanning trees, are known to summarize well the graph structure relevant to passive learning; see, e.g., [7, 8, 13]. In the case of active learning, however, we want good query sets on the graph to correspond to good query sets on the spanning tree, and random spanning trees may fail to do so in simple cases. For example, consider a set of m cliques connected through bridges, so that each clique is connected to, say, k other cliques. The breadth-first spanning tree of this graph is a set of connected stars. This tree clearly reveals a query set (the star centers) which is good for regular labelings (cf. the binary system example of Section 1). On the other hand, for certain choices of m and k a random spanning tree has a good probability of hiding the clustered nature of the original graph, thus leading to the selection of bad query sets.

8 If i belongs to a 1-hinge-tree, we simply predict $y_i$ with the unique label stored in the tag.

In order to gain intuition about this phenomenon, we are currently running experiments on various real-world graphs using different types of spanning trees, where we measure the number of mistakes made by our algorithm (for various choices of the budget size) against common baselines.

We also believe that an extension of our algorithm to general graphs does actually exist. However, the complexity of the methods employed in [6] suggests that techniques based on minimizing $\Psi^*$ on general graphs are computationally very expensive.

Finally, it would be interesting to combine active learning techniques on the nodes of a graph with those for predicting links (e.g., [9, 10]).
Figure 2: The SEL algorithm at work. The upper pane shows the initial tree $T = T^{\max}_1$ (in the box tagged with "1"), and the subsequent subtrees $T^{\max}_2$, $T^{\max}_3$, $T^{\max}_4$, and $T^{\max}_5$. The left pane also shows the nodes selected by SEL in chronological order. The four lower panes show the connected components of $T \setminus L_t$ resulting from this selection. Observe that at the end of round 3, SEL detects the generation of fork node 3'. This node gets stored, and is added to $L_{\mathrm{SEL}}$ at the end of the selection process.
Figure 3: The upper pane illustrates the different kinds of nodes chosen by SEL. Numbers in the square tags indicate the first six subtrees $T^{\max}_t$, and their associated nodes $i_t$, selected by SEL. Node $i_1$ is a [≥1; ≥1]-node, $i_2$ is an initial [0; ≥1]-node, $i_3$ is a (noninitial) [0; ≥1]-node, $i_4$ is an initial collision node, $i_5$ is a (noninitial) collision node, and $i_6$ is a [0; 0]-node. As in Figure 2, we denote by 3' the fork node generated by the inclusion of $i_3$ into $L_{\mathrm{SEL}}$. Note that node $i_5$ may be chosen arbitrarily among the four nodes in $T^{\max}_4 \setminus \{i_4\}$. The two black nodes are the set of nodes we are competing against, i.e., the nodes in the query set L. Forest $T \setminus L$ is made up of one large subtree and two small subtrees. In the lower panes we illustrate some steps of the proof of Lemma 6 with reference to the upper pane. Time t = 2: Trees $T^{\max}_2$ and $T_{i_2}$ are shown. As explained in the proof, $|T_{i_2}| \le |T^{\max}_2 \setminus T_{i_2}|$. The circled black node is captured by $i_2$. The nodes of tree $T^{\max}_2 \setminus T_{i_2}$ are shaded, and can be used for mapping, through the map associated with $i_2$, any component containing nodes of L. Time t = 3: Trees $T^{\max}_3$ and $T_{i_3}$ are shown. Again, one can easily verify that $|T_{i_3}| \le |T^{\max}_3 \setminus T_{i_3}|$. As before, the nodes of $T^{\max}_3 \setminus T_{i_3}$ are shaded, and can be used for mapping any such component. The reader can see that, by the injectivity of this mapping, these grey nodes are well separated from the ones in $T^{\max}_2 \setminus T_{i_2}$. Time t = 4: $T^{\max}_4$ and the initial collision node $i_4$ are depicted. The latter is drawn as a circled black node since it captures itself. Time t = 5, 6: We depict trees $T^{\max}_5$ and $T^{\max}_6$, together with nodes $i_5$ and $i_6$. Node $i_5$ is a collision node, which is not initial since it was already captured by the [0; ≥1]-node $i_2$. Node $i_6$ is a [0; 0]-node, so that the whole tree $T^{\max}_6$ is completely included in a component (the largest, in this case) of $T \setminus L$. Tree $T^{\max}_6$ can be used for mapping any such component. The resulting forest $T \setminus L_6$ includes several single-node trees and one two-node tree. If $i_6$ is the last node selected by SEL, then each component of $T \setminus L_6$ can be exploited by this mapping, since in this specific case none of these components contains nodes of L, i.e., there are no such components left.



